A Gamma mixture model better accounts for among site rate heterogeneity
Abstract
MOTIVATION: Variation of substitution rates across nucleotide and amino acid sites has long been recognized as a characteristic of molecular sequence evolution. Evolutionary models that account for this rate heterogeneity usually use a gamma density function to model the rate distribution across sites. This density function, however, may not fit real datasets, especially when the distribution of rates is multimodal. Here, we present a novel evolutionary model based on a mixture of gamma density functions, which better describes the among-site rate variation characteristic of molecular sequence evolution. The use of this model may improve the accuracy of various phylogenetic methods, such as reconstructing phylogenetic trees, dating divergence events, inferring ancestral sequences and detecting conserved sites in proteins.

RESULTS: Using diverse sets of protein sequences, we show that the gamma mixture model better describes the stochastic process underlying protein evolution: it fits the data significantly better than the single-gamma model in 9 of the 10 datasets tested. We further show that using the gamma mixture model improves the accuracy of model-based prediction of conserved residues in proteins.

AVAILABILITY: C++ source code is available from the authors upon request.
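The abstract contrasts a single gamma density with a mixture of gamma densities as the distribution of rates across sites. The sketch below is a minimal illustration under stated assumptions, not the authors' C++ implementation: it evaluates a two-component gamma mixture f(r) = sum_k w_k Gamma(r; alpha_k, beta_k), rescales the components so the mean rate is 1 (the usual convention for relative rates, assumed here), and checks numerically that the density integrates to 1. The weights, shapes and rates, and all names (`GammaComponent`, `mixture_pdf`, `rescale_to_unit_mean`, `integrate`), are hypothetical and were chosen to produce a bimodal rate distribution, which a single gamma density cannot represent because it has at most one mode.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaComponent:
    weight: float  # mixing proportion w_k (weights sum to 1)
    shape: float   # alpha_k
    rate: float    # beta_k, the inverse scale; component mean = alpha_k / beta_k


def gamma_pdf(x: float, shape: float, rate: float) -> float:
    """Gamma(shape, rate) density at x, computed in log space for stability."""
    if x <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_pdf)


def mixture_pdf(x: float, components: list[GammaComponent]) -> float:
    """f(r) = sum_k w_k * Gamma(r; alpha_k, beta_k)."""
    return sum(c.weight * gamma_pdf(x, c.shape, c.rate) for c in components)


def mixture_mean(components: list[GammaComponent]) -> float:
    """Mean of the mixture: sum_k w_k * alpha_k / beta_k."""
    return sum(c.weight * c.shape / c.rate for c in components)


def rescale_to_unit_mean(components: list[GammaComponent]) -> list[GammaComponent]:
    """Multiply every rate parameter by the current mean so the mixture mean becomes 1."""
    m = mixture_mean(components)
    return [GammaComponent(c.weight, c.shape, c.rate * m) for c in components]


def integrate(f, upper: float = 30.0, steps: int = 30_000) -> float:
    """Midpoint-rule integral of f over (0, upper]."""
    h = upper / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h


# Hypothetical two-component mixture: a slowly evolving class and a fast class.
mix = rescale_to_unit_mean([
    GammaComponent(weight=0.6, shape=2.0, rate=8.0),  # mean 0.25 before rescaling
    GammaComponent(weight=0.4, shape=5.0, rate=2.5),  # mean 2.0 before rescaling
])

total_mass = integrate(lambda r: mixture_pdf(r, mix))
mean_rate = integrate(lambda r: r * mixture_pdf(r, mix))
print(f"mass={total_mass:.4f}  mean={mean_rate:.4f}")
for r in (0.13, 1.0, 1.7):
    print(f"f({r}) = {mixture_pdf(r, mix):.4f}")
```

With these hypothetical parameters the density is higher at r = 0.13 and r = 1.7 than at r = 1.0, so the rate distribution has two modes. Rescaling the rate parameters (rather than the weights) keeps the mixing proportions fixed while enforcing a mean rate of 1.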
Similar articles
The Impact of Modelling Rate Heterogeneity among Sites on Phylogenetic Estimates of Intraspecific Evolutionary Rates and Timescales
Phylogenetic analyses of DNA sequence data can provide estimates of evolutionary rates and timescales. Nearly all phylogenetic methods rely on accurate models of nucleotide substitution. A key feature of molecular evolution is the heterogeneity of substitution rates among sites, which is often modelled using a discrete gamma distribution. A widely used derivative of this is the gamma-invariable...
A site- and time-heterogeneous model of amino acid replacement.
We combined the category (CAT) mixture model (Lartillot N, Philippe H. 2004) and the nonstationary break point (BP) model (Blanquart S, Lartillot N. 2006) into a new model, CAT-BP, accounting for variations of the evolutionary process both along the sequence and across lineages. As in CAT, the model implements a mixture of distinct Markovian processes of substitution distributed among sites, th...
A dirichlet process covarion mixture model and its assessments using posterior predictive discrepancy tests.
Heterotachy, the variation of substitution rate at a site across time, is a prevalent phenomenon in nucleotide and amino acid alignments, which may mislead probabilistic-based phylogenetic inferences. The covarion model is a special case of heterotachy, in which sites change between the "ON" state (allowing substitutions according to any particular model of sequence evolution) and the "OFF" sta...
Explaining Heterogeneity in Risk Preferences Using a Finite Mixture Model
This paper studies the effect of the space (distance) between lotteries' outcomes on risk-taking behavior and the shape of estimated utility and probability weighting functions. Previously investigated experimental data shows a significant space effect in the gain domain. As compared to low spaced lotteries, high spaced lotteries are associated with higher risk aversion for high probabilities o...
Using Regression based Control Limits and Probability Mixture Models for Monitoring Customer Behavior
In order to achieve the maximum flexibility in adaptation to ever changing customer’s expectations in customer relationship management, appropriate measures of customer behavior should be continually monitored. To this end, control charts adjusted for buyer’s/visitor’s prior intention to repurchase or visit again are suitable means taking into account the heterogeneity across customers. In the ...
Journal title:
- Bioinformatics
Volume 21, Issue Suppl 2
Publication date: 2005